Automatic Tagging of Compound Verb Groups in Czech Corpora

Internal identifier: 001C01 (Main/Exploration); previous: 001C00; next: 001C02

Authors: Eva Žáčková [Czech Republic]; Luboš Popelínský [Czech Republic]; Miloslav Nepil [Czech Republic]

Source:

Lecture Notes in Computer Science [ 0302-9743 ]

RBID : ISTEX:5986FD9D5B5AAF48236DA0482A5B726C251FF4C2

Abstract

In Czech corpora, compound verb groups are usually tagged in a word-by-word manner. As a consequence, some of the morphological tags of particular components of the verb group lose their original meaning. We present an improved method for automatic synthesis of verb rules. These rules describe all compound verb groups that are frequent in Czech. Using these rules, we can find compound verb groups in unannotated texts with high accuracy. The system for tagging compound verb groups in an annotated corpus that exploits the verb rules is described.

URL:

https://api.istex.fr/ark:/67375/HCB-FZCR209N-S/fulltext.pdf

DOI: 10.1007/3-540-45323-7_20

Affiliations:

Links to previous steps (curation, corpus, ...)

to stream Istex, to step Corpus: 001778
to stream Istex, to step Curation: 001276
to stream Istex, to step Checkpoint: 001A30
to stream Main, to step Merge: 001C45
to stream Main, to step Curation: 001C01

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Automatic Tagging of Compound Verb Groups in Czech Corpora</title>
<author><name sortKey="Za Kova, Eva" sort="Za Kova, Eva" uniqKey="Za Kova E" first="Eva" last="Žá Ková">Eva Žá Ková</name>
</author>
<author><name sortKey="Popelinsk, Lubos" sort="Popelinsk, Lubos" uniqKey="Popelinsk L" first="Luboš" last="Popelínsk">Luboš Popelínsk</name>
</author>
<author><name sortKey="Nepil, Miloslav" sort="Nepil, Miloslav" uniqKey="Nepil M" first="Miloslav" last="Nepil">Miloslav Nepil</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:5986FD9D5B5AAF48236DA0482A5B726C251FF4C2</idno>
<date when="2000" year="2000">2000</date>
<idno type="doi">10.1007/3-540-45323-7_20</idno>
<idno type="url">https://api.istex.fr/ark:/67375/HCB-FZCR209N-S/fulltext.pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001778</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">001778</idno>
<idno type="wicri:Area/Istex/Curation">001276</idno>
<idno type="wicri:Area/Istex/Checkpoint">001A30</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">001A30</idno>
<idno type="wicri:doubleKey">0302-9743:2000:Za Kova E:automatic:tagging:of</idno>
<idno type="wicri:Area/Main/Merge">001C45</idno>
<idno type="wicri:Area/Main/Curation">001C01</idno>
<idno type="wicri:Area/Main/Exploration">001C01</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Automatic Tagging of Compound Verb Groups in Czech Corpora</title>
<author><name sortKey="Za Kova, Eva" sort="Za Kova, Eva" uniqKey="Za Kova E" first="Eva" last="Žá Ková">Eva Žá Ková</name>
<affiliation wicri:level="3"><country xml:lang="fr">République tchèque</country>
<wicri:regionArea>NLP Laboratory, Faculty of Informatics, Masaryk University, Botanická 68, CZ-60200, Brno</wicri:regionArea>
<placeName><settlement type="city">Brno</settlement>
<region>Moravie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">République tchèque</country>
</affiliation>
</author>
<author><name sortKey="Popelinsk, Lubos" sort="Popelinsk, Lubos" uniqKey="Popelinsk L" first="Luboš" last="Popelínsk">Luboš Popelínsk</name>
<affiliation wicri:level="3"><country xml:lang="fr">République tchèque</country>
<wicri:regionArea>NLP Laboratory, Faculty of Informatics, Masaryk University, Botanická 68, CZ-60200, Brno</wicri:regionArea>
<placeName><settlement type="city">Brno</settlement>
<region>Moravie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">République tchèque</country>
</affiliation>
</author>
<author><name sortKey="Nepil, Miloslav" sort="Nepil, Miloslav" uniqKey="Nepil M" first="Miloslav" last="Nepil">Miloslav Nepil</name>
<affiliation wicri:level="3"><country xml:lang="fr">République tchèque</country>
<wicri:regionArea>NLP Laboratory, Faculty of Informatics, Masaryk University, Botanická 68, CZ-60200, Brno</wicri:regionArea>
<placeName><settlement type="city">Brno</settlement>
<region>Moravie</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">République tchèque</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s" type="main" xml:lang="en">Lecture Notes in Computer Science</title>
<idno type="ISSN">0302-9743</idno>
<idno type="ISSN">0302-9743</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: In Czech corpora, compound verb groups are usually tagged in a word-by-word manner. As a consequence, some of the morphological tags of particular components of the verb group lose their original meaning. We present an improved method for automatic synthesis of verb rules. These rules describe all compound verb groups that are frequent in Czech. Using these rules, we can find compound verb groups in unannotated texts with high accuracy. The system for tagging compound verb groups in an annotated corpus that exploits the verb rules is described.</div>
</front>
</TEI>
<affiliations><list><country><li>République tchèque</li>
</country>
<region><li>Moravie</li>
</region>
<settlement><li>Brno</li>
</settlement>
</list>
<tree><country name="République tchèque"><region name="Moravie"><name sortKey="Za Kova, Eva" sort="Za Kova, Eva" uniqKey="Za Kova E" first="Eva" last="Žá Ková">Eva Žá Ková</name>
</region>
<name sortKey="Nepil, Miloslav" sort="Nepil, Miloslav" uniqKey="Nepil M" first="Miloslav" last="Nepil">Miloslav Nepil</name>
<name sortKey="Nepil, Miloslav" sort="Nepil, Miloslav" uniqKey="Nepil M" first="Miloslav" last="Nepil">Miloslav Nepil</name>
<name sortKey="Popelinsk, Lubos" sort="Popelinsk, Lubos" uniqKey="Popelinsk L" first="Luboš" last="Popelínsk">Luboš Popelínsk</name>
<name sortKey="Popelinsk, Lubos" sort="Popelinsk, Lubos" uniqKey="Popelinsk L" first="Luboš" last="Popelínsk">Luboš Popelínsk</name>
<name sortKey="Za Kova, Eva" sort="Za Kova, Eva" uniqKey="Za Kova E" first="Eva" last="Žá Ková">Eva Žá Ková</name>
</country>
</tree>
</affiliations>
</record>
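
As an illustration of how a record like this can be read programmatically, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. It is not part of Dilib. The `SAMPLE` string and the `summarize` helper are hypothetical: `SAMPLE` is a trimmed excerpt of the record, keeping one author for brevity. The `wicri:` attributes are left out on purpose, since the record does not declare that namespace prefix and a strict XML parser would reject it as unbound.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical excerpt of the record above (one author kept).
# The wicri: attributes are omitted because their prefix is not declared
# in the record, which a strict XML parser treats as an error.
SAMPLE = """<record><TEI><teiHeader><fileDesc>
<titleStmt><title xml:lang="en">Automatic Tagging of Compound Verb Groups in Czech Corpora</title>
<author><name first="Miloslav" last="Nepil">Miloslav Nepil</name></author>
</titleStmt>
<publicationStmt>
<date when="2000" year="2000">2000</date>
<idno type="doi">10.1007/3-540-45323-7_20</idno>
</publicationStmt>
</fileDesc></teiHeader></TEI></record>"""


def summarize(xml_text):
    """Return the title, author names, year, and DOI found in a record string."""
    root = ET.fromstring(xml_text)
    stmt = root.find(".//titleStmt")
    pub = root.find(".//publicationStmt")
    return {
        "title": stmt.findtext("title"),
        # Each <author> wraps a <name> element whose text is the display name.
        "authors": [name.text for name in stmt.iter("name")],
        "year": pub.find("date").get("when"),
        # Select the <idno> whose type attribute is "doi".
        "doi": pub.findtext("idno[@type='doi']"),
    }


info = summarize(SAMPLE)
print(info)
```

To run this on the full record, the `wicri` prefix would first have to be declared on the root element or stripped from the text; which approach suits a given pipeline is left open here.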

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Informatique/explor/SgmlV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001C01 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001C01 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Informatique
   |area=    SgmlV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:5986FD9D5B5AAF48236DA0482A5B726C251FF4C2
   |texte=   Automatic Tagging of Compound Verb Groups in Czech Corpora
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jul 1 14:26:08 2019. Site generation: Wed Apr 28 21:40:44 2021

	SGML exploration server
	Warning: this site is under development. It was generated by automated means from raw corpora, so the information has not been validated.
